1. Introduction.

Consider the linear regression model

$$Y = X\beta + \varepsilon, \tag{1.1}$$

where $Y$ is an $(n \times 1)$ vector of observations on a dependent variable, $X$ is an $(n \times r)$ nonstochastic matrix (we assume full column rank for convenience) of observations on $r$ explanatory variables, $\beta$ is an $(r \times 1)$ vector of regression coefficients, and $\varepsilon$ is an $(n \times 1)$ vector of disturbances, normally distributed with zero expectation and covariance matrix $\sigma^2 I$. Estimators of $\beta$ improving upon the least squares estimator, equivalently the maximum likelihood estimator, have been extensively discussed. See, for example, Judge and Bock (1978). Improved estimation of the disturbance variance $\sigma^2$ seems to have been generally overlooked. The usual estimator of $\sigma^2$, $(Y-\hat{Y})'(Y-\hat{Y})/(n-r)$, is best unbiased but is inadmissible under squared error loss (SEL), $L(\hat{\sigma}^2, \sigma^2) = (\hat{\sigma}^2 - \sigma^2)^2$. It is immediately dominated by the best invariant estimator based upon the error sum of squares,

$$(Y-\hat{Y})'(Y-\hat{Y})/(n-r+2). \tag{1.2}$$
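The comparison between the divisors $n-r$ and $n-r+2$ can be checked with a few lines of arithmetic. The sketch below assumes only that the error sum of squares is distributed as $\sigma^2\chi^2_{n-r}$; the function name and the values of $n$ and $r$ are illustrative choices, not taken from the text.

```python
def scaled_chisq_mse(c, nu, sigma2=1.0):
    """Exact MSE of c * S as an estimator of sigma2 when S ~ sigma2 * chi-square(nu)."""
    # Uses E[S] = nu * sigma2 and E[S^2] = nu * (nu + 2) * sigma2^2.
    return sigma2 ** 2 * (c * c * nu * (nu + 2) - 2.0 * c * nu + 1.0)


n, r = 20, 4                      # hypothetical sample size and number of regressors
nu = n - r                        # degrees of freedom of the error sum of squares
mse_unbiased = scaled_chisq_mse(1.0 / nu, nu)          # algebraically 2 / (n - r)
mse_invariant = scaled_chisq_mse(1.0 / (nu + 2), nu)   # algebraically 2 / (n - r + 2)
print(mse_unbiased, mse_invariant)
```

The quadratic in `c` is minimized at `c = 1 / (nu + 2)`, which is why the divisor $n-r+2$ gives the best estimator among multiples of the error sum of squares.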


However, (1.2) is also inadmissible. In the sequel, we develop a class of estimators of $\sigma^2$ which arise naturally in a regression model and dominate (1.2) in terms of the mean square error (MSE), $E\,L(\hat{\sigma}^2, \sigma^2)$. The roots of this problem date to Stein (1964), who showed that if $X_1, \ldots, X_n$ are $N(\mu, \sigma^2)$, then $\sum (X_i - \bar{X})^2/(n+1)$, the best invariant estimator of $\sigma^2$ based upon $S = \sum (X_i - \bar{X})^2$, is inadmissible, i.e., for fixed $\mu = \mu_0$,

$$\delta(S, \bar{X}) = \min\Bigl\{\sum (X_i - \mu_0)^2/(n+2),\ \sum (X_i - \bar{X})^2/(n+1)\Bigr\}$$

dominates the best invariant estimator. The interpretation is that for some samples the best invariant estimator is too large. Interestingly, $\delta(S, \bar{X})$ does not dominate $\sum (X_i - \mu_0)^2/(n+2)$. The latter does better in a neighborhood of $\mu_0$. Brown (1968) subsequently extended this idea to a different class of dominating estimators.
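A small Monte Carlo sketch may help make Stein's construction concrete. It assumes i.i.d. normal samples and evaluates both estimators at the favourable point $\mu = \mu_0$; the sample size, seed, number of replications and function name are arbitrary illustrative choices.

```python
import numpy as np


def stein_variance_estimates(x, mu0):
    """Return (best invariant estimate, Stein-type estimate) for each row of x."""
    n = x.shape[1]
    s = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # S = sum (X_i - Xbar)^2
    s0 = ((x - mu0) ** 2).sum(axis=1)                           # sum (X_i - mu0)^2
    best_invariant = s / (n + 1)
    stein = np.minimum(s0 / (n + 2), best_invariant)
    return best_invariant, stein


rng = np.random.default_rng(0)
n, mu0, sigma2 = 5, 0.0, 1.0          # illustrative values only
x = rng.normal(loc=mu0, scale=np.sqrt(sigma2), size=(200_000, n))
bi, st = stein_variance_estimates(x, mu0)
mse_bi = float(np.mean((bi - sigma2) ** 2))
mse_st = float(np.mean((st - sigma2) ** 2))
print(mse_bi, mse_st)
```

By construction the Stein-type estimate never exceeds the best invariant one, which matches the interpretation that the latter is sometimes too large.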

For further reference, see Brewster and Zidek (1974), Strawderman (1974), and Gelfand and Dey (1986). The key point is that $(S, \bar{X})$ is a version of the sufficient statistic for this problem, that $\bar{X}$ contains information about $\sigma^2$, and that estimators based upon $S$ only will be inadmissible. In the regression model (1.1), a version of the sufficient statistic is $(\hat{\beta} = (X'X)^{-1}X'Y,\ SSE = (Y - X\hat{\beta})'(Y - X\hat{\beta}))$, which suggests why (1.2) will be inadmissible. Our development in extending Stein's work is briefly alluded to in Klotz, Milton and Zacks (1969, p. 1392) in the context of variance components. In Section 2, we offer a general development of dominating estimators of $\sigma^2$ for this type of problem. In Section 3, we look specifically at the example of the regression model in (1.1). We note that these results will be applicable to more complicated econometric models.


2. Development of Dominating Estimators.

Suppose we observe several independent chi-square random variables as $S_0 \sim \sigma^2\chi^2_{n_0}$ and $S_i \sim \sigma^2\chi^2_{n_i,\lambda_i}$, $i = 1,\ldots,p$, i.e., $S_0$ is central chi-square with $n_0$ d.f. and the $S_i$ are noncentral chi-square with $n_i$ d.f. and noncentrality parameter $\lambda_i$. Define $c_j = \sum_{i=0}^{j} n_i + 2$, $T_j = c_j^{-1}\sum_{i=0}^{j} S_i$ and finally

$$\delta_j = \min(T_0, T_1, \ldots, T_j), \quad j = 0, 1, \ldots, p. \tag{2.1}$$
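The definition in (2.1) amounts to a running minimum of successively pooled estimates. A minimal sketch, with a hypothetical function name and made-up inputs, is:

```python
import numpy as np


def delta_sequence(s, n):
    """Return (T, delta) with T[j] = (S_0 + ... + S_j) / c_j and
    delta[j] = min(T[0], ..., T[j]), where c_j = n_0 + ... + n_j + 2."""
    s = np.asarray(s, dtype=float)
    n = np.asarray(n, dtype=float)
    c = np.cumsum(n) + 2.0            # c_j
    t = np.cumsum(s) / c              # T_j
    delta = np.minimum.accumulate(t)  # running minimum gives delta_0, ..., delta_p
    return t, delta


# Made-up sums of squares and degrees of freedom, for illustration only.
t, delta = delta_sequence(s=[12.0, 1.0, 9.0], n=[10, 2, 3])
print(t)      # T_0, T_1, T_2
print(delta)  # delta_0, delta_1, delta_2
```

In this example the second component lowers the pooled estimate, so it is used, while the third would raise it and is ignored.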
Then we have

Theorem 2.1. In estimating $\sigma^2$ under squared error loss, the following holds:

$$\delta_0 \ll \delta_1 \ll \delta_2 \ll \cdots \ll \delta_p,$$

where $\delta_i \ll \delta_j$ means $\delta_j$ dominates $\delta_i$.

Proof. Since the MSE of $\delta_j$ is $E_{\sigma^2}(\sigma^2 - \delta_j)^2 = \sigma^4 E_1(1 - \delta_j)^2$, $j = 0,\ldots,p$, without loss of generality we take $\sigma^2 = 1$. We may consider $S_i \mid L_i \sim \chi^2_{n_i + 2L_i}$ where $L_i \sim \mathrm{Po}(\lambda_i)$, and given $L_1,\ldots,L_p$ the $S_i$, $i = 0,\ldots,p$, are conditionally independent. Moreover, the variables $U_j = c_jT_j/c_{j+1}T_{j+1}$, $j = 0,1,\ldots,p-1$, are also conditionally independent and for fixed $j$, $U_0,\ldots,U_{j-1}$ are conditionally independent of $T_j$.


Thus for any estimator of the form

$$c_j\,h(U_0, U_1, \ldots, U_{j-1})\,T_j$$

we may write its MSE at $\sigma^2 = 1$ as

$$E\bigl(c_j h(U_0, U_1, \ldots, U_{j-1})T_j - 1\bigr)^2 = E\bigl[E\{(h(U_0, U_1, \ldots, U_{j-1})\,c_jT_j - 1)^2 \mid L_1,\ldots,L_j\}\bigr]$$

$$= E\Bigl[\Bigl(c_j - 2 + 2\sum_{i=1}^{j} L_i\Bigr)\Bigl(c_j + 2\sum_{i=1}^{j} L_i\Bigr)E\{h^2(U_0, U_1, \ldots, U_{j-1}) \mid L_1,\ldots,L_j\} - 2\Bigl(c_j - 2 + 2\sum_{i=1}^{j} L_i\Bigr)E\{h(U_0, U_1, \ldots, U_{j-1}) \mid L_1,\ldots,L_j\}\Bigr] + 1 \tag{2.2}$$

(using the fact that given $L_1,\ldots,L_j$, $c_jT_j \sim \chi^2_{c_j - 2 + 2\sum_{i=1}^{j} L_i}$)

$$= E\Bigl[\Bigl\{h(U_0, U_1, \ldots, U_{j-1}) - \Bigl(c_j + 2\sum_{i=1}^{j} L_i\Bigr)^{-1}\Bigr\}^2\Bigl(c_j - 2 + 2\sum_{i=1}^{j} L_i\Bigr)\Bigl(c_j + 2\sum_{i=1}^{j} L_i\Bigr) + 2\Bigl(c_j + 2\sum_{i=1}^{j} L_i\Bigr)^{-1}\Bigr], \tag{2.3}$$

where the expectation in (2.3) is over the $U_i$ given $L_1,\ldots,L_j$ and then over $L_1,\ldots,L_j$.


From (2.3) we see that replacing $h$ by

$$h^* = \min(h, c_j^{-1})$$

yields an estimator $c_jh^*T_j$ which dominates $c_jhT_j$.
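One way to see why the truncation helps is to look at the conditional risk as a quadratic in $h$. The following is a sketch in the notation of (2.3); the shorthand $m = c_j - 2 + 2\sum_{i=1}^{j} L_i$ is introduced here for brevity and does not appear in the original.

```latex
\begin{align*}
E\{(h\,c_jT_j - 1)^2 \mid L_1,\ldots,L_j,\ U_0,\ldots,U_{j-1}\}
  &= h^2\,m(m+2) - 2h\,m + 1 \\
  &= m(m+2)\Bigl(h - \frac{1}{m+2}\Bigr)^2 + \frac{2}{m+2}.
\end{align*}
```

Since $1/(m+2) = (c_j + 2\sum_{i=1}^{j} L_i)^{-1} \le c_j^{-1}$ whatever the values of the $L_i$, any $h$ larger than $c_j^{-1}$ is farther from the minimizer than $c_j^{-1}$ itself, so truncating $h$ at $c_j^{-1}$ cannot increase the conditional risk.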

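In particular, writing $\delta_{j-1}$ in the form (2.2), we have

$$h(u_0,\ldots,u_{j-1}) = \theta(u_0,\ldots,u_{j-2})\,u_{j-1}, \tag{2.4}$$

where

$$\theta(U_0,\ldots,U_{j-2}) = \begin{cases} \prod_{i=k}^{j-2} U_i / c_k & \text{if } \prod_{i=k}^{k'-1} U_i < c_k/c_{k'},\ k < k', \text{ and } \prod_{i=k'}^{k-1} U_i \ge c_{k'}/c_k,\ k' < k, \\[4pt] c_{j-1}^{-1} & \text{if } \prod_{i=k}^{j-2} U_i \ge c_k/c_{j-1},\ k = 0, 1, \ldots, j-2, \end{cases}$$

i.e., $\theta = \prod_{i=k}^{j-2} U_i/c_k = T_k/c_{j-1}T_{j-1}$ if $T_k = \min_{0 \le i \le j-1} T_i$, $k < j-1$, and $\theta = c_{j-1}^{-1}$ if $T_{j-1} = \min_{0 \le i \le j-1} T_i$.

Finally, using $h^*$ in (2.4), we obtain $c_jh^*T_j = \min(c_jhT_j, T_j) = \min(\delta_{j-1}, T_j) = \delta_j$ dominating $c_jhT_j = \delta_{j-1}$.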

Remark 2.1. When $p = 1$, we obtain the fact that

$$\delta_1 = \min\bigl(S_0/(n_0+2),\ (S_0+S_1)/(n_0+n_1+2)\bigr)$$

dominates $S_0/(n_0+2)$, which includes Stein's (1964) result. We note, possibly counter to one's intuition, that $\delta_1$ does not dominate $(S_0+S_1)/(n_0+n_1+2)$. In particular, when the noncentrality parameter is very small, the MSE of $\delta_1$ is only slightly smaller than $2/(n_0+2)$ (see, for example, Brown (1968)), while that of $(S_0+S_1)/(n_0+n_1+2)$ is only slightly greater than $2/(n_0+n_1+2)$. Stated another way, $\delta_1$ is too small in the estimation of $\sigma^2$ when $\lambda_1$ is small. Figure 1 illustrates the situation.
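The behaviour described in this remark can be explored by simulation. The sketch below is illustrative only: it takes $\sigma^2 = 1$, builds the noncentral chi-square from shifted normals, and assumes the convention that the noncentrality parameter is half the squared length of the mean vector (so that the Poisson mixing variable has mean $\lambda_1$). The function name, degrees of freedom, seed and replication count are arbitrary.

```python
import numpy as np


def mse_comparison(n0, n1, lam, reps=200_000, seed=0):
    """Monte Carlo MSEs (sigma^2 = 1) of T_0, the pooled estimator T_1, and delta_1."""
    rng = np.random.default_rng(seed)
    s0 = rng.chisquare(n0, size=reps)
    # Noncentral chi-square built from shifted normals; lam = ||mu||^2 / 2 is the
    # convention assumed in this sketch.
    mu = np.zeros(n1)
    mu[0] = np.sqrt(2.0 * lam)
    s1 = ((rng.standard_normal((reps, n1)) + mu) ** 2).sum(axis=1)
    t0 = s0 / (n0 + 2)
    t1 = (s0 + s1) / (n0 + n1 + 2)
    d1 = np.minimum(t0, t1)
    return tuple(float(np.mean((e - 1.0) ** 2)) for e in (t0, t1, d1))


for lam in (0.0, 1.0, 5.0):
    print(lam, mse_comparison(n0=10, n1=3, lam=lam))
```

Under the stated convention the pooled estimator has exact MSE $2/(\nu+2) + 4\lambda_1^2/(\nu+2)^2$ with $\nu = n_0+n_1$, so its advantage at small $\lambda_1$ is lost as $\lambda_1$ grows, whereas the simulated MSE of $\delta_1$ should stay at or below that of $T_0$.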

More generally, the estimator $\delta_j$ as defined in (2.1) dominates $T_0$ but not $T_1, \ldots, T_j$.

Remark 2.2. Theorem 2.1 establishes the inadmissibility of nontrivial scale preserving estimators of the form (2.2). It can be generalized to the estimation of $\sigma^m$ by extending the discussion in Gelfand and Dey (1986). We omit the details here.

Next we state as Theorem 2.2 an extension of Theorem 2.1 for $p = 1$. The extension for general $p$ is apparent.

Theorem 2.2. Suppose $S_0 \sim \sigma^2\chi^2_{n_0}$, $S_1 \sim (\sigma^2+\tau^2)\chi^2_{n_1}$ and $S_0$, $S_1$ are independent. Using the notation of Theorem 2.1, in estimating $\sigma^2$, $\delta_1$ dominates $\delta_0$ under SEL.

Proof. We may think of $S_1$ as arising from $S_1 \mid W_1 = w_1 \sim \sigma^2\chi^2_{n_1, w_1}$, where $W_1 \sim (\tau^2/2\sigma^2)\chi^2_{n_1}$, so that the resulting marginal distribution of $S_1$ is $(\sigma^2+\tau^2)\chi^2_{n_1}$.

Hence, by Theorem 2.1, regardless of the given $W_1$, $\delta_0 \ll \delta_1$, whence integrating over $W_1$ yields the result.

Remark 2.3. Theorem 2.2 and its extension to general $p$ find immediate application to the estimation of the error variance in balanced variance components models. See Klotz, Milton, and Zacks (1969) for discussion in the one-way layout.

Remark 2.4. As in Remark 2.2, Theorem 2.2 can be generalized to the estimation of $\sigma^m$.

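Returning to the setting of Theorem 2.1, let $\alpha = (\alpha_1,\ldots,\alpha_p)$ be any permutation of the integers $1,\ldots,p$ and let $c_j^{\alpha} = n_0 + \sum_{i=1}^{j} n_{\alpha_i} + 2$. Also, define $T_j^{\alpha} = (c_j^{\alpha})^{-1}\bigl(\sum_{i=1}^{j} S_{\alpha_i} + S_0\bigr)$, where as before $S_0 \sim \sigma^2\chi^2_{n_0}$, $S_{\alpha_i} \sim \sigma^2\chi^2_{n_{\alpha_i},\lambda_{\alpha_i}}$, and $S_0$ and the $S_{\alpha_i}$ are all independent. Finally, define

$$\delta_j^{\alpha} = \min(T_0, T_1^{\alpha}, \ldots, T_j^{\alpha}). \tag{2.6}$$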


Then Theorem 2.1 implies that in estimating $\sigma^2$ under SEL, for each $\alpha$, $\delta_0^{\alpha} \ll \delta_1^{\alpha} \ll \delta_2^{\alpha} \ll \cdots \ll \delta_p^{\alpha}$. Hence we obtain $p!$ estimators, $\delta_p^{\alpha}$, each defined by a permutation of $1,\ldots,p$. These estimators are order dependent (i.e., dependent upon the specification of a particular permutation) and a natural question to ask is how to combine these to construct a permutation invariant estimator. A first thought is $\delta^* = \min_{\alpha} \delta_p^{\alpha}$. The discussion in Remark 2.1 shows that $\delta^*$ will be "too small" and will not dominate $\delta_p^{\alpha}$. A better choice will be

$$\bar{\delta} = (p!)^{-1}\sum_{\alpha} \delta_p^{\alpha}, \tag{2.7}$$


where the summation is over all permutations of $1,\ldots,p$. In particular we have

Theorem 2.3. If the MSE of $\delta_p^{\alpha}$ is constant, say $m$, for all $\alpha$, then $\delta_p^{\alpha} \ll \bar{\delta}$ for any permutation $\alpha = (\alpha_1,\ldots,\alpha_p)$.

Proof. The MSE of $\bar{\delta}$ is

$$E(\bar{\delta} - 1)^2 = E\Bigl[(p!)^{-1}\sum_{\alpha}(\delta_p^{\alpha} - 1)\Bigr]^2$$

$$= (p!)^{-1} m + (p!)^{-2}\sum_{\alpha \ne \alpha'} E(\delta_p^{\alpha} - 1)(\delta_p^{\alpha'} - 1)$$

$$= m + (p!)^{-2}\sum_{\alpha \ne \alpha'}\bigl[E(\delta_p^{\alpha} - 1)(\delta_p^{\alpha'} - 1) - m\bigr] \le m,$$

since by the Cauchy-Schwarz inequality

$$E(\delta_p^{\alpha} - 1)(\delta_p^{\alpha'} - 1) \le \{E(\delta_p^{\alpha} - 1)^2\,E(\delta_p^{\alpha'} - 1)^2\}^{1/2} = m.$$

This completes the proof of the theorem.
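For small $p$ the order-dependent estimators of (2.6) and their average in (2.7) can be enumerated directly. The following sketch uses hypothetical function names and made-up inputs; indices in the code run from 0 to $p-1$ rather than 1 to $p$.

```python
from itertools import permutations

import numpy as np


def delta_p_alpha(s0, n0, s, n, alpha):
    """delta_p^alpha: pool the components in the order alpha and take the running minimum."""
    total, df = float(s0), float(n0)
    best = total / (df + 2.0)                 # T_0
    for i in alpha:
        total += s[i]
        df += n[i]
        best = min(best, total / (df + 2.0))  # T_j^alpha
    return best


def delta_bar(s0, n0, s, n):
    """Average of delta_p^alpha over all p! orderings, with the smallest and largest value."""
    values = [delta_p_alpha(s0, n0, s, n, a) for a in permutations(range(len(s)))]
    return float(np.mean(values)), min(values), max(values)


# Made-up inputs for illustration.
avg, smallest, largest = delta_bar(s0=12.0, n0=10, s=[1.0, 9.0, 2.5], n=[2, 3, 1])
print(avg, smallest, largest)
```

The cost grows as $p!$, so full enumeration is only practical for a handful of components; averaging over a random subset of orderings would be a different estimator and is not covered by Theorem 2.3.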

3.  Application  to  Linear  Regression. 

In this section we will use the improved estimators of $\sigma^2$ as developed in Section 2, in a linear regression context. Consider the linear model (1.1) and suppose we are interested in testing the hypothesis $H'\beta = \xi$. Let $R_0^2$, $R_1^2$ be the full model and reduced model error sum of squares respectively, i.e.,

$$R_0^2 = \min_{\beta}\,(Y - X\beta)'(Y - X\beta)$$

and

$$R_1^2 = \min_{H'\beta = \xi}\,(Y - X\beta)'(Y - X\beta).$$

Then, whether or not the hypothesis is true, it follows that (see, e.g., Rao (1973) for details) $R_0^2 \sim \sigma^2\chi^2_{n-r}$ and is independent of $R_1^2 - R_0^2 \sim \sigma^2\chi^2_{k,\lambda}$, where $k = \mathrm{rank}(H)$ and $\lambda$ is the resulting noncentrality parameter. Thus from Remark 2.1, it follows that

$$\delta_1 = \min\Bigl(\frac{R_0^2}{n-r+2},\ \frac{R_1^2}{n-r+k+2}\Bigr) \tag{3.1}$$

dominates $R_0^2/(n-r+2)$, which is the best invariant estimator of $\sigma^2$ of the form $cR_0^2$.

The estimator (3.1) can also be viewed as a preliminary test estimator for testing the null hypothesis that $H'\beta = \xi$ vs. the alternative $H'\beta \ne \xi$. For a definition and discussion of preliminary test estimators, see Judge and Bock (1978).
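As an illustration of how (3.1) might be computed, the sketch below uses the standard identity $R_1^2 - R_0^2 = (H'\hat{\beta} - \xi)'[H'(X'X)^{-1}H]^{-1}(H'\hat{\beta} - \xi)$. It assumes that $H$ is an $r \times k$ matrix of full column rank; the function name and the simulated data are hypothetical.

```python
import numpy as np


def delta_1_regression(y, X, H, xi):
    """min(R0^2/(n-r+2), R1^2/(n-r+k+2)) for the restriction H' beta = xi.

    Assumes X has full column rank and H (r x k) has full column rank."""
    n, r = X.shape
    k = np.linalg.matrix_rank(H)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    r0_sq = float(resid @ resid)                       # full-model error sum of squares
    gap = H.T @ beta_hat - xi
    # Increase in the error sum of squares caused by imposing the restriction.
    r1_sq = r0_sq + float(gap @ np.linalg.solve(H.T @ xtx_inv @ H, gap))
    return min(r0_sq / (n - r + 2), r1_sq / (n - r + k + 2)), r0_sq, r1_sq


rng = np.random.default_rng(1)
n, r = 30, 4
X = rng.standard_normal((n, r))
y = X @ np.array([1.0, -0.5, 0.0, 0.0]) + rng.standard_normal(n)
H = np.zeros((r, 2))
H[2, 0] = 1.0
H[3, 1] = 1.0                                          # restriction: last two coefficients are zero
delta_1, r0_sq, r1_sq = delta_1_regression(y, X, H, np.zeros(2))
print(delta_1, r0_sq / (n - r + 2))
```

When the restriction is roughly compatible with the data, the second ratio tends to be the smaller one and the estimate pools the extra $k$ degrees of freedom; otherwise the usual full-model ratio is returned.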

Let us now consider general $p$. We presume a sequence of nested hypotheses as given below:

$$H_0 : X\beta \in M(X) \quad \text{with } \dim(M(X)) = r,$$

$$H_i : X\beta \in S_i \subset M(X) \quad \text{with } \dim(S_i) = k_i, \quad i = 1,\ldots,p,$$

where $M(X)$ denotes the linear manifold of $X$ (i.e., the vector space generated by columns of $X$), $S_1 \supset S_2 \supset \cdots \supset S_p$ and $k_1 > k_2 > \cdots > k_p$.
12  p  1  2  p 

Now define

$$R_i^2 = \min_{X\beta \in S_i}\,(Y - X\beta)'(Y - X\beta), \quad i = 1,\ldots,p,$$

and again let $R_0^2$ be the full model error sum of squares. Thus it follows that

$$S_i = R_i^2 - R_{i-1}^2 \sim \sigma^2\chi^2_{n_i,\lambda_i}, \quad i = 1,\ldots,p,$$

and the $S_i$'s are independent and also independent of $S_0 = R_0^2 \sim \sigma^2\chi^2_{n_0}$, where $n_0 = n - r$, $n_1 = r - k_1$, $n_i = k_{i-1} - k_i$, $i = 2,\ldots,p$. Thus we are in the framework of Section 2. Applying Theorem 2.1, we obtain the improved estimator of $\sigma^2$ as

$$\delta_p = \min\Bigl(\frac{R_0^2}{n-r+2},\ \frac{R_1^2}{n+2-k_1},\ \frac{R_2^2}{n+2-k_2},\ \ldots,\ \frac{R_p^2}{n+2-k_p}\Bigr). \tag{3.2}$$

Computation of (3.2) should present no problem since the $R_i^2$ are obtained in fitting the nested models.
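A minimal sketch of this computation follows. It covers only the special case in which each nested subspace is spanned by a subset of the columns of $X$ (an assumption made here for simplicity; general subspaces would need a restricted least squares fit), and the function names and data are hypothetical.

```python
import numpy as np


def sse(y, X):
    """Error sum of squares from least squares of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def delta_p_nested(y, X, kept_columns):
    """Minimum of R_i^2 / (n + 2 - k_i) over a nested sequence of column subsets.

    kept_columns[0] should list all columns (the full model); later entries
    list the columns retained under each successive hypothesis."""
    n = X.shape[0]
    ratios = []
    for cols in kept_columns:
        k = len(cols)                      # dimension of the subspace (full column rank assumed)
        ratios.append(sse(y, X[:, cols]) / (n + 2 - k))
    return min(ratios), ratios


rng = np.random.default_rng(2)
n = 25
X = rng.standard_normal((n, 3))
y = X @ np.array([2.0, 0.0, 0.0]) + rng.standard_normal(n)
delta_p, ratios = delta_p_nested(y, X, [[0, 1, 2], [0, 1], [0]])
print(delta_p, ratios)
```

The first ratio is the usual $R_0^2/(n-r+2)$, so the returned estimate can never exceed it.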

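In the special case where we are looking at the $r$ explanatory variables individually, we have $r!$ sequences in which the variables can be removed. For any particular sequence, $\alpha$, using the notation of (2.6) in (3.2), we obtain (with $p = r$)

$$\delta_p^{\alpha} = \min\Bigl(\frac{R_0^2}{n-r+2},\ \frac{(R_1^{\alpha})^2}{n-r+3},\ \ldots,\ \frac{(R_r^{\alpha})^2}{n+2}\Bigr). \tag{3.3}$$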


In obtaining an estimator of $\sigma^2$ we then have two possibilities.

(i) If, on the basis of prior experience or theoretical grounds, we have a particular sequence in which the variables are to be entered, hence removed, then this sequence provides (3.3). If, however, this sequence arises from some formal variable selection procedure, then $\alpha$ is data dependent, whence the resultant (3.3) does not meet the assumptions of Theorem 2.1, so that no claims can be made for its MSE performance. In fact, such a $\delta_p^{\alpha}$ is nearly $\min_{\alpha} \delta_p^{\alpha}$, which from remarks after (2.6) will likely be "too small."

(ii) Calculate $\bar{\delta}$ as in (2.7). We may argue that the MSE's of the $\delta_p^{\alpha}$ are likely to be close. First we expect that each of the $\delta_p^{\alpha}$ achieves small improvement in MSE over $R_0^2/(n-r+2)$ (see, e.g., Brown 1968 for some empirical evidence) and second, the sequence of denominators in (3.3) is the same regardless of $\alpha$. Theorem 2.3 thus encourages $\bar{\delta}$.